Studying Properties of Czech Complex Sentences from an Annotated Corpus

Authors

  • Vladislav Kubon
  • Markéta Lopatková
Abstract

The paper addresses the analysis of complex sentences in Czech on the basis of manually annotated data. A specialized corpus that explicitly describes the mutual relationships between segments and clauses in Czech complex sentences, together with a thoroughly syntactically annotated corpus, the Prague Dependency Treebank, provides a solid background for linguistic investigation. The paper presents quantitative, linguistic and structural observations that offer a number of clues for building a future algorithm for analyzing the structure of complex sentences.

Similar resources

An Annotated Corpus Outside Its Original Context: A Corpus-Based Exercise Book

We present the STYX system, which is designed as an electronic corpus-based exercise book of Czech morphology and syntax with sentences directly selected from the Prague Dependency Treebank, the largest annotated corpus of the Czech language. The exercise book offers complex sentence processing with respect to both morphological and syntactic phenomena, i.e., the exercises allow students of bas...

Segmentation of Complex Sentences

The paper describes a method of dividing complex sentences into segments: easily detectable, linguistically motivated units that may subsequently be combined into clauses and thus reveal the structure of a complex sentence with regard to the mutual relationships of its individual clauses. The method has been developed for Czech as a language representing languages with a relatively high degree of wo...

Prague Dependency Treebank as an Exercise Book of Czech

In the beginning there was simply linguistics. Over the years, linguistics has acquired various attributes, for example "corpus". While the term "corpus" is relatively young in linguistics, what it denotes, a collection of texts and speech in a language, is nothing new at all. Speaking of corpus linguistics nowadays, we have in mind the collecting of language resources in an electro...

Aspect-Level Sentiment Analysis in Czech

This paper presents pioneering research on aspect-level sentiment analysis in Czech. The main contribution of the paper is a newly created Czech aspect-level sentiment corpus, based on data from restaurant reviews. We annotated the corpus with two variants of aspect-level sentiment: aspect terms and aspect categories. The corpus consists of 1,244 sentences and 1,824 annotated aspects and is...

Czech Legal Text Treebank 1.0

We introduce a new member of the family of Prague dependency treebanks. The Czech Legal Text Treebank 1.0 is a morphologically and syntactically annotated corpus of 1,128 sentences. The treebank contains texts from the legal domain, namely the documents from the Collection of Laws of the Czech Republic. Legal texts differ from other domains in several language phenomena influenced by rather hig...

Publication date: 2011